Nonparametric Depth-Based Multivariate Outlier Identifiers, and Robustness Properties

Authors

  • Xin Dang
  • Robert Serfling
Abstract

In extending univariate outlier detection methods to higher dimensions, various special issues arise, such as limitations of visualization methods, inadequacy of marginal methods, lack of a natural order, limited scope of parametric modeling, and restriction to ellipsoidal contours when using Mahalanobis distance methods. Here we pass beyond these limitations via an approach based on depth functions, which order multidimensional data points by “outlyingness” measures and generate contours following the shape of the data set. This approach to multivariate outlier detection is nonparametric and, with typical choices of depth function, robust. For depth-based outlier identifiers, we define masking and swamping breakdown points, adapting ideas of Davies and Gather (1993) and Becker and Gather (1999) given for parametric outlier identification with normal models. The values of these robustness measures are established for three important depth functions, the spatial, the projection, and a generalized Tukey. Comparison is made with procedures of Maronna and Yohai (1995) and Becker and Gather (1999) using a 5-dimensional data set treated by them. Our three depth-based procedures, and one of Maronna and Yohai (1995) using the Stahel-Donoho multivariate location and dispersion estimators, are closely competitive and superior to the others, with perhaps the spatial and projection depth procedures having a slight edge. AMS 2000 Subject Classification: Primary 62G10; Secondary 62H99.
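As a rough illustration of how a depth function can order points by outlyingness, the sketch below computes the sample spatial outlyingness (the norm of the average unit vector pointing from the data points to the query point) and the corresponding spatial depth, taken as one minus that value. The function names and the cutoff in `flag_outliers` are assumptions made for this example; the abstract does not specify a threshold or an implementation.

```python
import numpy as np

def spatial_outlyingness(x, data):
    """Norm of the average unit vector from each row of `data` to `x`.

    Values near 0 indicate a central point; values near 1 indicate a
    point lying far outside the data cloud.
    """
    diffs = np.asarray(x, dtype=float) - np.asarray(data, dtype=float)
    norms = np.linalg.norm(diffs, axis=1)
    units = np.zeros_like(diffs)
    keep = norms > 0  # a zero difference contributes a zero vector
    units[keep] = diffs[keep] / norms[keep, None]
    return float(np.linalg.norm(units.mean(axis=0)))

def spatial_depth(x, data):
    """Spatial depth, taken here as one minus the spatial outlyingness."""
    return 1.0 - spatial_outlyingness(x, data)

def flag_outliers(data, threshold=0.9):
    """Flag rows whose outlyingness exceeds `threshold`.

    The threshold is a hypothetical choice for illustration only.
    """
    data = np.asarray(data, dtype=float)
    scores = np.array([spatial_outlyingness(row, data) for row in data])
    return scores > threshold

# Four points placed symmetrically around the origin.
square = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
center_score = spatial_outlyingness([0.0, 0.0], square)
far_score = spatial_outlyingness([100.0, 0.0], square)
```

In this toy configuration the origin is the deepest point, while a point far from the cloud receives an outlyingness close to 1. Because the score depends only on directions to the data, it makes no ellipsoidal-contour assumption.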


Similar resources

Nonparametric Depth-Based Multivariate Outlier Identifiers, and Masking Robustness Properties

In extending univariate outlier detection methods to higher dimension, various issues arise: limited visualization methods, inadequacy of marginal methods, lack of a natural order, limited parametric modeling, and, when using Mahalanobis distance, restriction to ellipsoidal contours. To address and overcome such limitations, we introduce nonparametric multivariate outlier identifiers based on m...


A numerical study of multiple imputation methods using nonparametric multivariate outlier identifiers and depth-based performance criteria with clinical laboratory data

It is well known that if a multivariate outlier has one or more missing component values, then multiple imputation methods tend to impute non-extreme values and make the outlier become less extreme and less likely to be detected. In this paper, nonparametric depth-based multivariate outlier identifiers are used as criteria in a numerical study comparing several established methods of multiple im...


Survey on (Some) Nonparametric and Robust Multivariate Methods

Rather than attempt an encyclopedic survey of nonparametric and robust multivariate methods, we limit to a manageable scope by focusing on just two leading and pervasive themes, descriptive statistics and outlier identification. We set the stage with some perspectives, and we conclude with a look at some open issues and directions. A variety of questions are raised. Is nonparametric inference t...


Chapter 1 OUTLIER DETECTION

Outlier detection is a primary step in many data-mining applications. We present several methods for outlier detection, while distinguishing between univariate vs. multivariate techniques and parametric vs. nonparametric procedures. In the presence of outliers, special care should be taken to ensure the robustness of the estimators used. Outlier detection for data mining is often based on dist...


Depth-based nonparametric multivariate analysis and its application in review of new treatment methodology on osteoarthrotic

In this article, we first introduce the depth function as a function for center-outward ranking. Then we present and use the halfspace, or Tukey, depth function as one of the most popular depth functions. Next, multivariate nonparametric tests for location and scale differences between two populations are expressed by ranking and statistics based on the depth versus depth plot. Finally, accord...



Publication date: 2006